Abstract

Compressed sensing deals with the recovery of sparse signals from linear measurements. Without any additional information, it is possible to recover an s-sparse signal using $m \gtrsim s\log(d/s)$ measurements in a robust and stable way. Some applications provide additional information, such as information on the location of the support of the signal. Using this information, it is conceivable that the threshold number of measurements can be lowered. A proposed algorithm for this task is weighted $\ell_1$-minimization. Put shortly, one modifies standard $\ell_1$-minimization by assigning different weights to different parts of the index set $[1,\dots,d]$. The task of choosing the weights is, however, non-trivial.

This paper provides a complete answer to the question of an optimal choice of the weights. In fact, it is shown that it is possible to directly calculate unique weights that are optimal in the sense that the threshold number of measurements needed for exact recovery is minimized. The proof uses recent results about the connection between convex geometry and compressed sensing-type algorithms.

Keywords: Compressed sensing, Gaussian matrices, weighted basis pursuit, convex geometry. 


1 Introduction 

At the heart of the theory of compressed sensing is the paradigm that it is possible to recover a sparse vector $x_0 \in \mathbb{R}^d$ using few linear measurements. A widely used method to perform this reconstruction is Basis Pursuit, i.e. to solve the minimization problem

$$\min \|x\|_1 \text{ subject to } Ax = b. \qquad (P_1)$$

A fundamental result is that $m \gtrsim s\log(d/s)$ Gaussian random measurements are sufficient for Basis Pursuit to succeed at recovering an s-sparse vector with overwhelmingly high probability [2].

The main problem in recovering a signal in this setting is in fact to decide where its support $\operatorname{supp} x_0$ is situated. Indeed, if we knew the exact position of $\operatorname{supp} x_0$, we would be able to recover $x_0$ by simply solving a system of linear equations with at least as many equations as unknowns, provided $m \geq s$. It is thus plausible that if we are given prior information about the location of $\operatorname{supp} x_0$, it should be possible to use this information to lower the threshold number of measurements needed to secure recovery. To be precise, let us assume that we know that a fraction $\alpha$ of the indices in some set $S$ are in the support of $x_0$. This can be interpreted as the probability for an $i \in S$ to be an element of $\operatorname{supp} x_0$. We will subsequently denote $T = \operatorname{supp} x_0$, for the sake of brevity. Intuitively, it seems that if $\alpha$ is high while $\rho := |S|/d$ is not too large, we should be able to find the solution using fewer equations than if we did not have the initial guess.


This work was supported by the Deutsche Forschungsgemeinschaft, DFG under Grant KU 1446/18-1. The author is with the Institut für Mathematik, Technische Universität Berlin, Germany (e-mail: flinth@math.tu-berlin.de).


© IEEE 2016. This paper has been published on IEEE Xplore, DOI 10.1109/tit.2016.2569122







This exact situation was investigated in the paper [3]. Their proposed solution is to use a weighted $\ell_1$-approach, namely to solve the minimization problem

$$\min \|x\|_{1,w} \text{ subject to } Ax = b, \qquad (P_1^w)$$

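where $\|\cdot\|_{1,w}$ denotes the weighted $\ell_1$-norm, given by

$$\|x\|_{1,w} = \sum_{i=1}^d w_i|x(i)|.$$

(The notation $\|\cdot\|_{1,w}$ should of course not be confused with the weak $\ell_1$-norm.) We can likewise define the reweighted $\ell_p$-norms for every $p \in [1,\infty]$:

$$\|x\|_{p,w} = \Big(\sum_{i=1}^d w_i^p|x(i)|^p\Big)^{1/p}, \ p < \infty, \quad \text{and} \quad \|x\|_{\infty,w} = \sup_i w_i|x(i)|.$$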


Choosing the weights $w_i$ is a subtle process, but the general idea is that the weights on the indices in $S$ should be lower. This will have the effect that large values $x(i)$ on $S$ are not penalized, which is desirable if $T \approx S$. Many papers have been concerned with the choice $w = 1_{S^c}$ (the setting corresponding to this choice is sometimes called modified Compressed Sensing [10]), but the authors of [3] suggest that one should give oneself the freedom to choose $w$ as $\omega 1_S + 1_{S^c}$ for some $\omega \in [0,1]$. Here, as well as in the following, $1_M$ refers to the indicator function of a set $M$, i.e.

$$1_M(i) = \begin{cases} 1 & \text{if } i \in M, \\ 0 & \text{else.} \end{cases}$$
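To make the weighting concrete, the following Python sketch builds the two-level weight vector $w = \omega 1_S + 1_{S^c}$ and evaluates the weighted norms. It is an illustration only: the function names are hypothetical, indices are 0-based, and for $p < \infty$ it assumes the convention $(\sum_i w_i^p|x(i)|^p)^{1/p}$, which is the one consistent with the formula for $\|x\|_{\infty,w}$ as $p \to \infty$.

```python
import math
from typing import Iterable, List, Sequence


def two_level_weights(d: int, S: Iterable[int], omega: float) -> List[float]:
    """Return w = omega * 1_S + 1_{S^c} as a list of length d (0-based indices)."""
    S = set(S)
    return [omega if i in S else 1.0 for i in range(d)]


def weighted_norm(x: Sequence[float], w: Sequence[float], p: float) -> float:
    """Weighted l_p norm ||x||_{p,w}.

    Assumed convention: (sum_i (w_i |x_i|)^p)^(1/p) for p < inf,
    and max_i w_i |x_i| for p = inf.
    """
    terms = [wi * abs(xi) for xi, wi in zip(x, w)]
    if math.isinf(p):
        return max(terms)
    return sum(t ** p for t in terms) ** (1.0 / p)


if __name__ == "__main__":
    x = [3.0, -4.0, 0.0, 1.0]
    w = two_level_weights(4, S={0, 1}, omega=0.5)  # entries in S are penalized less
    print(w)                              # [0.5, 0.5, 1.0, 1.0]
    print(weighted_norm(x, w, 1))         # 4.5
    print(weighted_norm(x, w, math.inf))  # 2.0
```

Lowering $\omega$ below 1 makes large entries on $S$ cheaper, which is exactly the effect described above.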


Tedious arguments by the authors of [3] showed that provided the guess is good ($\alpha > .5$), the required bound on the RIP-constant of the matrix $A$ needed for robust recovery can be softened, and the stability constants get smaller, if one chooses $\omega = 0$. The formulas actually suggest that one should either use $\omega = 0$ or $\omega = 1$, and never something in between. Numerical experiments do not harmonize with this rule of thumb: in fact, if the guess is bad ($\alpha < .5$), one does significantly better choosing $\omega$ between 0 and 1 than choosing one of the extreme values. The authors left it as an open problem to analyze how to choose the weight in an optimal way.


1.1 Previous results on the optimal choice of weights. 

It is of course not entirely clear what “optimal” means in this context. One way of defining it is to say that a weight is optimal if the minimal number of measurements needed in order for $(P_1^w)$ with $b = Ax_0$ to recover $x_0$ is as small as possible. A popular model assumption is that $A$ is chosen according to some probability distribution. Because of its universality as a limit distribution, the Gaussian probability distribution is probably the most canonical one. In this setting, we search for the minimal number of measurements required for $(P_1^w)$ to succeed with high probability (the exact meaning of “high” varies from application to application).

The first result concerning choosing $\omega$ optimally in this sense was probably Theorem 4.3.3 of the PhD thesis [11] (the result was also presented in [5]). Based on a very sophisticated argument relying on calculating internal and external angles of certain polytopes, the authors implicitly provide a threshold $m_0(\omega)$ so that if one uses $m > m_0(\omega)$ Gaussian measurements, $(P_1^w)$ will succeed with high probability. One can then minimize $m_0(\omega)$ with respect to the weight $\omega$. Since the formulas are very complicated, this is, however, a tough task.

A result which is simpler to grasp is given in [8]. The strategy is basically the same. One computes a threshold depending on the weight $\omega$, using the famous “escape through a mesh lemma” [4], and then minimizes the threshold with respect to $\omega$. Although the paper provides powerful and beautiful results on how the optimal threshold depends on $\alpha$ and $\rho$, it does not directly provide any method of actually calculating the optimal weights. Furthermore, as the authors point out, the result only concerns an upper bound on how many measurements one needs.

A different analysis is conducted in [7]. The authors define a weighted Null Space Property and prove several interesting results about that property. In particular, they calculate upper bounds on the threshold number of measurements needed to secure that Gaussian matrices have that property. The main technical tool is again Gordon’s escape through a mesh lemma. As a concrete weighting strategy, it is proposed to set the weight $\omega = 1 - \alpha$, since that weight minimizes their bound on the thresholds. One should note that the weighted Null Space Property is uniform in nature, in the sense that it is equivalent to the statement that all signals supported on a set $S$ will be recovered from their linear measurements through weighted $\ell_1$-minimization as long as the quality of the support estimate is good enough; the exact position of the support does not matter.

1.2 Contributions of this paper 

In this paper, we will consider the full model with $k$ subsets $S_i$ of the complete index set $[1,\dots,d]$, each one with a corresponding probability $\alpha_i$. Correspondingly, we will allow for $k$ weights $\omega_i$, one on each $S_i$. We will use the short-hand notation $\omega = (\omega_1,\dots,\omega_k)$. The strategy of finding the optimal weights is very similar to the ones of the papers mentioned above: we will calculate a threshold $m_0$ depending on $\omega$ and then prove that $m_0(\omega)$ is minimized for an optimal set of weights. To be precise, we will in fact not minimize $m_0$ itself, but only an upper and a lower bound on $m_0$. These bounds are, however, simultaneously minimized for the same $\omega^*$ and also lie very close to each other. We will also investigate how the mentioned bounds on $m_0$ depend on the properties of the sets $S_i$, $i = 1,\dots,k$. In particular, we will find the perhaps somewhat remarkable fact that one needs in principle as many measurements to recover an $(\alpha_1|S_1| + \dots + \alpha_k|S_k|)$-sparse signal given the prior knowledge of the probabilities $\alpha = (\alpha_1,\dots,\alpha_k)$ and the sets $(S_1,\dots,S_k)$ as to recover $k$ signals $x_i$ separately, where for each $i$, $x_i$ is $\alpha_i|S_i|$-sparse and known to be supported on the set $S_i$. This was already proven for upper bounds on the number of measurements needed in [5], but here, we also provide a statement about lower bounds.

In order to do this, we will use a powerful and recent result from [1], namely that the recovery probability of a convex program undergoes a phase transition as the number of measurements surpasses the statistical dimension $\delta(C)$ of a certain convex cone. We will provide a way of calculating a set of weights $\omega^*$ which simultaneously minimizes a tight lower and upper bound of $\delta(C)$, and hence to some extent settle the problem regarding the optimal choice of weights when recovering a sparse vector $x_0$ with prior information. Additionally, we will even provide a simple analytical formula for $\omega^*$.

The fact that we are using the new notion of statistical dimension is the main technical difference from the prior work mentioned in the previous section, where the escape through a mesh lemma was the main tool in most cases.

The notation in the paper will be fairly standard. There are, however, a few things that should be pointed out. For a vector $w \in \mathbb{R}^d$, $\operatorname{diag}(w)$ is the diagonal matrix in $\mathbb{R}^{d \times d}$ whose diagonal is $w$. $\operatorname{pos}(x)$ is the positive part of a real number, i.e. $\operatorname{pos}(x) = x$ if $x > 0$, and otherwise $\operatorname{pos}(x) = 0$. For $x \in \mathbb{R}^d$, $\operatorname{sgn} x$ denotes the vector consisting of the signs of the entries of $x$, with the convention $\operatorname{sgn} 0 = 0$. Finally, $|\cdot|$ will have different meanings depending on its argument. If $S$ is a real number, $|S|$ denotes its absolute value. For vectors, $|S|$ is the vector formed by taking the absolute value pointwise. If $S$ is a set, $|S|$ denotes the number of elements of $S$.


2 Statistical dimensions and the measurement threshold 

Let us begin by fixing the model situation and notation. $x_0 \in \mathbb{R}^d$ is an s-sparse vector, meaning that only $s$ of the entries $x_0(i)$, $i = 1,\dots,d$, are nonzero. In addition to the prior information that $x_0$ is sparse, we are given some partial information on where the support $T$ of $x_0$ is situated, expressed by the partition $\mathcal{S} = (S_i)_{i=1}^k$ of $[1,\dots,d]$. With each set $S_i$ we associate the two parameters $\rho_i$ and $\alpha_i$,

$$\rho_i = \frac{|S_i|}{d} \quad \text{and} \quad \alpha_i = \frac{|T \cap S_i|}{|S_i|},$$

reflecting its size and the probability for a $j \in S_i$ to be an element of $T$, $i = 1,\dots,k$, respectively. To each $S_i$, we will assign a weight $\omega_i > 0$ and always choose the weights according to

$$w = \sum_{i=1}^k \omega_i 1_{S_i}.$$

We will without loss of generality assume that $\alpha_i > 0$ for every $i$ (if $\alpha_i = 0$ for some $i$, the corresponding set of indices $S_i$ can be completely ignored in the recovery process).
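The parameters $\rho_i$, $\alpha_i$ and the expanded weight vector $w = \sum_i \omega_i 1_{S_i}$ can be computed directly from a partition and a support. The following is a minimal Python sketch with hypothetical helper names and 0-based indices; the usage lines use a configuration like the one in the numerical experiments further below.

```python
from typing import List, Sequence, Set, Tuple


def partition_parameters(d: int, partition: Sequence[Set[int]],
                         support: Set[int]) -> Tuple[List[float], List[float]]:
    """Return (rho, alpha) with rho_i = |S_i| / d and alpha_i = |T ∩ S_i| / |S_i|."""
    covered = sorted(j for S in partition for j in S)
    if covered != list(range(d)):
        raise ValueError("the sets S_i must form a partition of {0, ..., d-1}")
    rho = [len(S) / d for S in partition]
    alpha = [len(support & S) / len(S) for S in partition]
    return rho, alpha


def expand_weights(d: int, partition: Sequence[Set[int]], omega: Sequence[float]) -> List[float]:
    """Return w = sum_i omega_i 1_{S_i} as a list of length d."""
    w = [0.0] * d
    for S, om in zip(partition, omega):
        for j in S:
            w[j] = om
    return w


if __name__ == "__main__":
    d = 100
    T = set(range(10))                  # a 10-sparse support
    S1 = set(range(7)) | {10, 11, 12}   # 7 indices inside T, 3 outside
    S2 = set(range(d)) - S1
    rho, alpha = partition_parameters(d, [S1, S2], T)
    print(rho, alpha)  # rho = (0.1, 0.9), alpha = (0.7, 3/90); sum_i rho_i alpha_i = 0.1
```

Note that $\sum_i \rho_i\alpha_i = |T|/d = \sigma$ holds for any partition, which is used repeatedly later on.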

Now we define the object which we will search for in the rest of this section: the threshold number of measurements needed for the program $(P_1^w)$ to be successful.

Definition 1. Let $A \in \mathbb{R}^{m \times d}$ be a Gaussian matrix. Then $\mu_{s,d}$ denotes the normalized threshold number of measurements needed for $(P_1)$ with $Ax_0 = b$ to succeed at recovering an s-sparse vector $x_0$, i.e., $(P_1)$ will succeed with high probability if $m > d(\mu_{s,d} + o(1))$ and will fail with high probability if $m < d(\mu_{s,d} - o(1))$.

In the exact same way, we define for a given support $T$ and partition $\mathcal{S} = (S_i)_{i=1}^k$ the number $m_{\mathcal{S},T,w}$ as the normalized threshold number of measurements for $(P_1^w)$ with $Ax_0 = b$ to succeed at recovering a vector $x_0$ supported on $T$. Finally, $m_{\mathcal{S},T}$ is the value obtained by choosing the weights $w$ optimally, i.e. $m_{\mathcal{S},T} = \inf_{\omega > 0} m_{\mathcal{S},T,w}$.

Note that we are only concerned with exact recovery of exactly sparse signals, and, in particular, leave out cases where noise is involved and the signals are only approximately sparse. We believe that the requirement of stability and robustness will not significantly increase the measurement threshold, but also think that a thorough analysis of these questions is beyond the scope of this paper. Therefore they are left as possible lines of future research.

The authors of [5] provided upper bounds $\eta_{s,d}$ and $\eta_{\mathcal{S},T}$ of $\mu_{s,d}$ and $m_{\mathcal{S},T}$, respectively. (Note that our notation is a bit different from theirs: what we call $\eta_{\mathcal{S},T}$ is denoted $\eta(\alpha,\beta)$ by them, where $\alpha$ and $\beta$ denote $|S_1 \cap T|$ and $|S_2 \cap T|$, respectively.) They then proceed to prove the following very beautiful result.

Theorem 2. [5, Th. 3.1] We have

$$\eta_{(S_1,S_2),T} = \rho_1\eta_{\alpha_1|S_1|,|S_1|} + \rho_2\eta_{\alpha_2|S_2|,|S_2|}.$$

Theorem 2 in fact states that in order to find an $(\alpha_1\rho_1 + \alpha_2\rho_2)d$-sparse vector using the prior information, we need about as many measurements as the sum of the measurements needed to separately recover two vectors that are $\alpha_i\rho_i d$-sparse and known to be supported on $S_i$, $i = 1,2$, respectively. The authors of [8] conjectured that the theorem can be generalized to more than $k = 2$ sets $S_i$. We will do this at the end of the paper, with the addition that the upper bound we provide on $m_{\mathcal{S},T}$ in fact also induces a lower bound.

$\mu_{s,d}$ and $m_{\mathcal{S},T,w}$ can be calculated using a few results from [1]. Before stating the results we will use, let us define the statistical dimension of a cone $C$.

Definition 3. [1] Let $C \subseteq \mathbb{R}^d$ be a closed convex cone and $g$ be a Gaussian vector. Let furthermore $\Pi_C$ denote the metric projection onto $C$, i.e.

$$\Pi_C(x) = \operatorname{argmin}_{c \in C}\|c - x\|_2.$$

The statistical dimension $\delta(C)$ of $C$ is then defined as

$$\delta(C) = \mathbb{E}\big(\|\Pi_C g\|_2^2\big).$$

The statistical dimension of a general convex cone is the statistical dimension of its closure.
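As a sanity check of Definition 3, one can estimate $\delta(C)$ by Monte Carlo whenever $\Pi_C$ is easy to evaluate. For the nonnegative orthant $C = \mathbb{R}^d_+$ the projection is the entrywise positive part, and by symmetry $\delta(\mathbb{R}^d_+) = d/2$. The orthant is our own choice of example, not one used in the paper, and the helper names are hypothetical; the sketch uses only the Python standard library.

```python
import random


def project_orthant(x):
    """Metric projection onto the closed convex cone R^d_+ (entrywise positive part)."""
    return [max(xi, 0.0) for xi in x]


def statistical_dimension_mc(project, d, trials=5000, seed=0):
    """Monte Carlo estimate of delta(C) = E ||Pi_C g||_2^2 for a standard Gaussian g in R^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += sum(p * p for p in project(g))
    return total / trials


if __name__ == "__main__":
    d = 20
    est = statistical_dimension_mc(project_orthant, d)
    print(f"estimate {est:.3f} vs exact {d / 2}")
```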
The main result of [1] shows that the statistical dimension of the descent cone $\mathcal{D}(f,x_0)$ of a convex function $f$ at $x_0$, i.e.,

$$\mathcal{D}(f,x_0) = \{v \in \mathbb{R}^d \mid \exists\,\tau > 0 : f(x_0 + \tau v) \leq f(x_0)\},$$

can be used to calculate the minimal number of measurements needed for the program

$$\min f(x) \text{ subject to } Ax = b \qquad (P_f)$$

with $b = Ax_0$ to succeed at recovering a vector $x_0$. It is well known that $(P_f)$ has $x_0$ as its unique solution if and only if $\mathcal{D}(f,x_0) \cap \ker A = \{0\}$. This leads to the following precise result:

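Theorem 4. [1, Th. I] Let $C \subseteq \mathbb{R}^d$ be a convex cone and $A \in \mathbb{R}^{m \times d}$ a Gaussian matrix. Then if

$$\delta(C) + \sqrt{8\log(4/\eta)}\,\sqrt{d} \leq m,$$

we have $\mathbb{P}(C \cap \ker A \neq \{0\}) \leq \eta$. Likewise, if

$$\delta(C) - \sqrt{8\log(4/\eta)}\,\sqrt{d} \geq m,$$

then $\mathbb{P}(C \cap \ker A \neq \{0\}) \geq 1 - \eta$.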
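The two regimes of Theorem 4 are easy to evaluate numerically. The sketch below (hypothetical function name) returns the measurement counts below which failure, and above which success, is guaranteed with probability at least $1 - \eta$; the value $\delta(C) = 30$ in the usage lines is purely illustrative.

```python
import math


def theorem4_bounds(delta: float, d: int, eta: float):
    """Return (m_fail, m_success) from Theorem 4.

    If m <= m_fail, then P(C ∩ ker A != {0}) >= 1 - eta;
    if m >= m_success, then P(C ∩ ker A != {0}) <= eta.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    slack = math.sqrt(8.0 * math.log(4.0 / eta)) * math.sqrt(d)
    return delta - slack, delta + slack


if __name__ == "__main__":
    m_fail, m_success = theorem4_bounds(delta=30.0, d=100, eta=0.1)
    print(m_fail, m_success)  # the slack is about 54.3, larger than delta itself here
```

The slack grows like $\sqrt{d}$, whereas for a fixed sparsity ratio the statistical dimensions considered in this paper grow like $d$, so the transition only becomes sharp relative to $d$ in high dimensions.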

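In particular, the probability that the solution of the problem $(P_f)$ is equal to $x_0$ undergoes a phase transition as $m$ surpasses $\delta(\mathcal{D}(f,x_0))$, and hence we have

$$\mu_{s,d} = \frac{\delta(\mathcal{D}(\|\cdot\|_1,x_0))}{d} \quad \text{and} \quad m_{\mathcal{S},T,w} = \frac{\delta(\mathcal{D}(\|\cdot\|_{1,w},x_0))}{d}.$$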


The authors of [1] proceeded to calculate $\delta(\mathcal{D}(\|\cdot\|_1,x_0))$ as an exemplary application of a technique they presented. We will use the same technique to calculate $\delta(\mathcal{D}(\|\cdot\|_{1,w},x_0))$, but before that, let us state the result regarding standard Basis Pursuit.

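Theorem 5. [1, Prop. 4.5] Define the function $J_\sigma$ through

$$J_\sigma(\tau) = \sigma(1 + \tau^2) + (1 - \sigma)\varphi(\tau),$$

where $\varphi : [0,\infty) \to [0,1]$ is defined through

$$\varphi(\tau) = \sqrt{\frac{2}{\pi}}\int_\tau^\infty (x - \tau)^2\exp(-x^2/2)\,dx. \qquad (1)$$

The statistical dimension of the descent cone of the $\ell_1$-norm at an s-sparse $x_0 \in \mathbb{R}^d$ satisfies

$$\inf_{\tau > 0}J_\sigma(\tau) - \frac{2}{d}\sqrt{\frac{1}{\sigma}} \leq \frac{\delta(\mathcal{D}(\|\cdot\|_1,x_0))}{d} \leq \inf_{\tau > 0}J_\sigma(\tau),$$

where $\sigma = s/d$.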
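The quantities in Theorem 5 are straightforward to evaluate. Writing $Q$ and $p_N$ for the upper tail and the density of the standard normal distribution, expanding the square in (1) gives the closed form $\varphi(\tau) = 2\big((1+\tau^2)Q(\tau) - \tau p_N(\tau)\big)$. Since $J_\sigma$ is convex in $\tau$, its infimum can be located by a ternary search. The sketch below uses our own helper names and only the standard library, and cross-checks the closed form against direct Simpson quadrature of (1).

```python
import math


def varphi(tau: float) -> float:
    """Equation (1) in closed form: 2 * ((1 + tau^2) Q(tau) - tau * pdf(tau))."""
    pdf = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(tau / math.sqrt(2.0))  # Q(tau) = P(N(0, 1) > tau)
    return 2.0 * ((1.0 + tau * tau) * tail - tau * pdf)


def varphi_quadrature(tau: float, length: float = 12.0, n: int = 4000) -> float:
    """Composite Simpson rule for (1) on [tau, tau + length]; the neglected tail is tiny."""
    h = length / n

    def f(x: float) -> float:
        return (x - tau) ** 2 * math.exp(-0.5 * x * x)

    total = f(tau) + f(tau + length)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(tau + i * h)
    return math.sqrt(2.0 / math.pi) * total * h / 3.0


def J(sigma: float, tau: float) -> float:
    """J_sigma(tau) = sigma (1 + tau^2) + (1 - sigma) varphi(tau)."""
    return sigma * (1.0 + tau * tau) + (1.0 - sigma) * varphi(tau)


def inf_J(sigma: float, lo: float = 0.0, hi: float = 10.0, iters: int = 200):
    """Ternary search for the minimum of the convex map tau -> J_sigma(tau) on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if J(sigma, m1) < J(sigma, m2):
            hi = m2
        else:
            lo = m1
    tau = 0.5 * (lo + hi)
    return tau, J(sigma, tau)


if __name__ == "__main__":
    tau_star, value = inf_J(0.1)
    print(f"sigma = 0.1: minimizer ~ {tau_star:.4f}, inf J ~ {value:.4f}")
```

Multiplying `inf_J(s / d)[1]` by $d$ gives the upper bound of Theorem 5 on the statistical dimension, i.e. on the number of Gaussian measurements needed by standard Basis Pursuit.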

Now, as promised, we will prove a similar result regarding the weighted Basis Pursuit $(P_1^w)$. We will use the same technique as the authors of [1] and therefore start by calculating the expected squared distance from a Gaussian vector to a dilation of the subdifferential of the weighted $\ell_1$-norm. The subdifferential [9, pp. 214-215] of a convex function $f$ at a point $x_0$ is the set of vectors $p$ satisfying

$$f(x_0 + v) \geq f(x_0) + \langle p, v\rangle \quad \text{for all } v \in \mathbb{R}^d.$$

For this, we will need a fact on subdifferentials of norms. The following result is folklore in this area, but let 
us include a proof for completeness. 
Proposition 6. For any norm $\|\cdot\|$ on $\mathbb{R}^d$, denote by $\|\cdot\|_*$ its dual norm, i.e.

$$\|p\|_* = \sup_{\|v\| = 1}\langle p, v\rangle.$$

Then for every $x_0 \neq 0$

$$\partial\|x_0\| = \{p \mid \|p\|_* = 1,\ \langle p, x_0\rangle = \|x_0\|\},$$

where $\langle p, x_0\rangle$ denotes the canonical dual pairing, i.e., the Euclidean scalar product.

Proof. We aim to characterize the vectors $p \in \mathbb{R}^d$ which have the property

$$\|x_0 + v\| \geq \|x_0\| + \langle p, v\rangle \quad \text{for all } v \in \mathbb{R}^d. \qquad (2)$$

By taking the supremum in (2) over all $v$ with $\|v\| = 1$, we immediately obtain

$$\sup_{\|v\|=1}\langle p, v\rangle \leq \sup_{\|v\|=1}\|x_0 + v\| - \|x_0\| \leq \sup_{\|v\|=1}\|v\| = 1,$$

i.e. $\|p\|_* \leq 1$. By further choosing $v = -x_0$, we get $\langle p, x_0\rangle \geq \|x_0\|$, which proves $\|p\|_* \geq 1$, and therefore also $\|p\|_* = 1$. This further implies that $\langle p, x_0\rangle = \|x_0\|$. If, on the other hand, $\|p\|_* = 1$ and $\langle p, x_0\rangle = \|x_0\|$, we have

$$\|x_0\| + \langle p, v\rangle = \langle p, x_0 + v\rangle \leq \|x_0 + v\|,$$

and hence (2) is true, and we are done. □
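Proposition 6 can be checked numerically for the weighted $\ell_1$-norm, anticipating the description of $\partial\|x_0\|_{1,w}$ derived in the proof of Lemma 7: a vector $p$ that equals $w_j\operatorname{sgn} x_0(j)$ on the support and satisfies $|p(j)| \leq w_j$ elsewhere should have dual norm $\max_j |p(j)|/w_j = 1$, satisfy $\langle p, x_0\rangle = \|x_0\|_{1,w}$, and obey the subgradient inequality. The sketch below (hypothetical helper names, randomly drawn test points) checks exactly these three facts, and also that violating the off-support constraint breaks the subgradient inequality.

```python
import random


def weighted_l1(x, w):
    return sum(wi * abs(xi) for xi, wi in zip(x, w))


def dual_weighted_l1(p, w):
    """Dual norm of ||.||_{1,w}, i.e. ||p||_{inf, w^{-1}} = max_i |p_i| / w_i."""
    return max(abs(pi) / wi for pi, wi in zip(p, w))


def random_subgradient(x0, w, rng):
    """p = D^w sgn(x0) on supp x0, and |p_j| <= w_j (drawn uniformly) off the support."""
    return [wi * (1.0 if xi > 0 else -1.0) if xi != 0 else rng.uniform(-wi, wi)
            for xi, wi in zip(x0, w)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def check_subgradient(x0, w, p, rng, trials=1000, tol=1e-12):
    """Test ||x0 + v||_{1,w} >= ||x0||_{1,w} + <p, v> at random points v."""
    base = weighted_l1(x0, w)
    for _ in range(trials):
        v = [rng.gauss(0.0, 2.0) for _ in x0]
        shifted = [a + b for a, b in zip(x0, v)]
        if weighted_l1(shifted, w) < base + dot(p, v) - tol:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(1)
    x0 = [1.5, 0.0, -2.0, 0.0, 0.3]
    w = [0.5, 1.0, 0.5, 1.0, 2.0]
    p = random_subgradient(x0, w, rng)
    print(dual_weighted_l1(p, w), dot(p, x0), weighted_l1(x0, w), check_subgradient(x0, w, p, rng))
```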

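Now we can calculate $\mathbb{E}\big(\operatorname{dist}^2(g, \tau\cdot\partial\|x_0\|_{1,w})\big)$ for $x_0 \in \mathbb{R}^d$ and a Gaussian vector $g$.

Lemma 7. Let $x_0 \in \mathbb{R}^d$ be arbitrary and $g$ a Gaussian vector. We have

$$\mathbb{E}\big(d^{-1}\operatorname{dist}^2(g, \tau\cdot\partial\|x_0\|_{1,w})\big) = \sigma + \sum_{i=1}^k \rho_i\big(\alpha_i\omega_i^2\tau^2 + (1 - \alpha_i)\varphi(\omega_i\tau)\big),$$

where $\sigma = \|x_0\|_0/d$, $w = \sum_{i=1}^k \omega_i 1_{S_i}$ and $\varphi$ is defined through Equation (1).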

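Proof. Let us start by noticing that the dual norm of $\|\cdot\|_{1,w}$ is $\|\cdot\|_{\infty,w^{-1}}$. The Hölder inequality implies that for every $x, v \in \mathbb{R}^d$

$$\langle x, v\rangle = \sum_{i=1}^d w(i)^{-1}x(i)\,w(i)v(i) \leq \|x\|_{\infty,w^{-1}}\|v\|_{1,w},$$

i.e. the dual norm of $\|\cdot\|_{1,w}$, evaluated at $x$, is at most $\|x\|_{\infty,w^{-1}}$. By choosing $v$ equal to the vector for which $v(i) = \operatorname{sgn} x(i)$ if $i$ is the index where $w(i)^{-1}|x(i)|$ is maximal, and else $v(i) = 0$, we even see that equality holds.

This, together with the fact that under the assumption $\|p\|_{\infty,w^{-1}} = 1$, $\langle x_0, p\rangle = \|x_0\|_{1,w}$ if and only if $p|_{\operatorname{supp} x_0} = D^w\operatorname{sgn} x_0|_{\operatorname{supp} x_0}$, implies (by Proposition 6) that

$$\partial\|x_0\|_{1,w} = \{D^w\operatorname{sgn} x_0 + v \mid \operatorname{supp} v \cap \operatorname{supp} x_0 = \emptyset,\ |v_j| \leq w_j\ \forall j\},$$

where $D^w = \operatorname{diag}(w) = \operatorname{diag}\big(\sum_{i=1}^k \omega_i 1_{S_i}\big)$. Abbreviating $T = \operatorname{supp} x_0$, we see that

$$\operatorname{dist}^2(g, \tau\cdot\partial\|x_0\|_{1,w}) = \sum_{j \in T}\big(g(j) - \operatorname{sgn}(x_0(j))w_j\tau\big)^2 + \sum_{j \notin T}\operatorname{pos}\big(|g(j)| - w_j\tau\big)^2$$

(the part on $T$ of a vector in $\tau\partial\|x_0\|_{1,w}$ is fixed to $\tau D^w\operatorname{sgn} x_0$, and the other part can elementwise not be larger than $\tau w$ in absolute value). Taking the expectation, we obtain

$$\begin{aligned}
\mathbb{E}\big(\operatorname{dist}^2(g, \tau\cdot\partial\|x_0\|_{1,w})\big) &= \sum_{j \in T}(1 + w_j^2\tau^2) + \sum_{j \notin T}\sqrt{\frac{2}{\pi}}\int_{w_j\tau}^\infty (x - w_j\tau)^2\exp(-x^2/2)\,dx \\
&= |T| + \sum_{i=1}^k\big(|S_i \cap T|\,\omega_i^2\tau^2 + |S_i \cap T^c|\,\varphi(\omega_i\tau)\big) \\
&= d\Big(\sigma + \sum_{i=1}^k \rho_i\big(\alpha_i(\omega_i\tau)^2 + (1 - \alpha_i)\varphi(\omega_i\tau)\big)\Big).
\end{aligned}$$

□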


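For convenience, let us abbreviate

$$J_{\sigma,\alpha,\rho,\omega}(\tau) = \sigma + \sum_{i=1}^k \rho_i\big(\alpha_i(\omega_i\tau)^2 + (1 - \alpha_i)\varphi(\omega_i\tau)\big).$$

Also define

$$\hat{m}_{\mathcal{S},T,w} = \inf_{\tau > 0}J_{\sigma,\alpha,\rho,\omega}(\tau), \qquad (3)$$

where $\alpha$ and $\rho$ are the parameters associated with $\mathcal{S}$ and $T$, and $w = \sum_{i=1}^k \omega_i 1_{S_i}$.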
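The abbreviation $J_{\sigma,\alpha,\rho,\omega}$ and the quantity (3) are cheap to evaluate, and Lemma 7 can be cross-checked by sampling the squared-distance formula from its proof. The sketch below is our own illustration (hypothetical names, standard library only). It takes $x_0$ positive on $T$, which loses no generality because the Gaussian distribution is symmetric, and it uses the closed form of $\varphi$ obtained by expanding the square in (1).

```python
import math
import random


def varphi(tau):
    """Equation (1) in closed form: 2 * ((1 + tau^2) Q(tau) - tau * pdf(tau))."""
    pdf = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return 2.0 * ((1.0 + tau * tau) * tail - tau * pdf)


def J(tau, sigma, alpha, rho, omega):
    """J_{sigma,alpha,rho,omega}(tau) as in Lemma 7."""
    return sigma + sum(r * (a * (o * tau) ** 2 + (1.0 - a) * varphi(o * tau))
                       for a, r, o in zip(alpha, rho, omega))


def m_hat(sigma, alpha, rho, omega, hi=20.0, iters=200):
    """Equation (3): inf_{tau > 0} J(tau) by ternary search (J is convex in tau)."""
    lo = 0.0
    f = lambda t: J(t, sigma, alpha, rho, omega)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


def lemma7_monte_carlo(tau, d, partition, support, omega, trials=2000, seed=0):
    """Average of d^{-1} dist^2(g, tau * subdiff ||x0||_{1,w}) using the formula in the proof."""
    rng = random.Random(seed)
    w = [0.0] * d
    for S, o in zip(partition, omega):
        for j in S:
            w[j] = o
    total = 0.0
    for _ in range(trials):
        s = 0.0
        for j in range(d):
            g = rng.gauss(0.0, 1.0)
            if j in support:  # sgn x0(j) = +1 on T
                s += (g - w[j] * tau) ** 2
            else:
                s += max(abs(g) - w[j] * tau, 0.0) ** 2
        total += s / d
    return total / trials


if __name__ == "__main__":
    d, T = 100, set(range(10))
    S1 = set(range(7)) | {10, 11, 12}
    S2 = set(range(d)) - S1
    alpha, rho, omega = [0.7, 3 / 90], [0.1, 0.9], [0.5, 1.0]
    print(J(1.2, 0.1, alpha, rho, omega), lemma7_monte_carlo(1.2, d, [S1, S2], T, omega))
    print(m_hat(0.1, alpha, rho, omega))
```

With $k = 1$, $\rho_1 = 1$, $\alpha_1 = \sigma$ and $\omega_1 = 1$, `J` reduces to $J_\sigma$ from Theorem 5.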

As the notation suggests, $\hat{m}_{\mathcal{S},T,w}$ is not far away from $m_{\mathcal{S},T,w}$. We will prove this using Lemma 7 together with the following theorem from [1].

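Theorem 8. [1, Th. 4.3] Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. Then the statistical dimension of the descent cone of $\|\cdot\|$ at a point $x_0 \in \mathbb{R}^d \setminus \{0\}$ satisfies

$$0 \leq \inf_{\tau > 0}\mathbb{E}\big(\operatorname{dist}^2(g, \tau\partial\|x_0\|)\big) - \delta(\mathcal{D}(\|\cdot\|,x_0)) \leq \frac{2\sup_{s \in \partial\|x_0\|}\|s\|_2}{\big\|x_0/\|x_0\|_2\big\|}.$$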



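Proposition 9. The statistical dimension of $\mathcal{D}(\|\cdot\|_{1,w},x_0)$ satisfies

$$\frac{\delta(\mathcal{D}(\|\cdot\|_{1,w},x_0))}{d} \leq \hat{m}_{\mathcal{S},T,w}, \qquad (4)$$

$$\frac{\delta(\mathcal{D}(\|\cdot\|_{1,w},x_0))}{d} \geq \hat{m}_{\mathcal{S},T,w} - \frac{2}{d}\sqrt{\frac{1}{a}}, \qquad (5)$$

where $a = \min_i \alpha_i$ and $\hat{m}_{\mathcal{S},T,w}$ is defined through (3). Moreover, the following bound independent of $a$ holds:

$$\hat{m}_{\mathcal{S},T,w} - \frac{2}{\sqrt{d}} \leq m_{\mathcal{S},T,w} \leq \hat{m}_{\mathcal{S},T,w}. \qquad (6)$$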


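Proof. We use Theorem 8. To control the error term, by using that $|S_i \cap T| = \alpha_i\rho_i d$ for each $i$, we obtain the estimate

$$\Big\|\frac{x}{\|x\|_2}\Big\|_{1,w} = \frac{\sum_{j \in T}w_j|x(j)|}{\|x\|_2} \leq \|w|_T\|_2 = \sqrt{\sum_{i=1}^k \omega_i^2\alpha_i\rho_i d},$$

which is valid for any $x$ supported on $T$. We even have equality if we choose $|x|$ parallel to $w|_T$. We can do this since, as pointed out in [1], there is no need to use Theorem 8 directly for $x_0$. We only have to secure that $x$ has the same $\mathcal{D}(\|\cdot\|_{1,w},x)$ and $\partial\|x\|_{1,w}$ as $x_0$, since $\inf_{\tau>0}J_{\sigma,\alpha,\rho,\omega}(\tau) - \delta(\mathcal{D}(\|\cdot\|_{1,w},x_0))/d$ only depends on those entities. It is clear from the considerations in the proof of Lemma 7 that for this to be true, $x$ only has to be supported on $T$ and have the same sign pattern as $x_0$. It is equally clear that there exist such vectors $x$ so that $|x|$ is parallel to $w|_T$.

This proves that it is possible to choose the denominator of the error term in Theorem 8 equal to $\sqrt{\sum_{i=1}^k \omega_i^2\alpha_i\rho_i d}$. To bound the numerator, it suffices to notice that each vector $s \in \partial\|x_0\|_{1,w}$ satisfies

$$\|s\|_2^2 = \sum_{j=1}^d s_j^2 = \sum_{i=1}^k\sum_{j \in S_i}s_j^2 \leq \sum_{i=1}^k \omega_i^2|S_i| = \sum_{i=1}^k \omega_i^2\rho_i d,$$

since $|s_j| \leq w_j$ for each $j$, and $|S_i| = \rho_i d$. To finish the proof of the tighter bound (5), we note that

$$\frac{\sqrt{d\sum_{i=1}^k \omega_i^2\rho_i}}{\sqrt{d\sum_{i=1}^k \omega_i^2\alpha_i\rho_i}} \leq \sqrt{\frac{1}{\min_i \alpha_i}} = \sqrt{\frac{1}{a}}.$$

Finally, the bound (6) easily follows from that statement, since

$$da = \min_i\frac{d\,|S_i \cap T|}{|S_i|} \geq 1,$$

using the facts that $|S_i \cap T| \geq 1$ and $|S_i| \leq d$, so that $\frac{2}{d}\sqrt{1/a} = 2/\sqrt{d \cdot da} \leq 2/\sqrt{d}$. □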
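The three error terms appearing in this proof can be compared numerically. The sketch below (hypothetical names) evaluates the exact bound $\frac{2}{d}\sqrt{\sum_i \omega_i^2\rho_i / \sum_i \omega_i^2\alpha_i\rho_i}$, the cruder bound $\frac{2}{d}\sqrt{1/a}$ from (5), and the bound $2/\sqrt{d}$ from (6), for parameters resembling the first numerical experiment below.

```python
import math


def prop9_error_terms(d, alpha, rho, omega):
    """Return the three lower-bound error terms from the proof of Proposition 9 (normalized by d)."""
    num = sum(o * o * r for o, r in zip(omega, rho))                 # ||w||_2^2 / d
    den = sum(o * o * a * r for o, a, r in zip(omega, alpha, rho))   # ||w|_T||_2^2 / d
    exact = (2.0 / d) * math.sqrt(num / den)
    via_min_alpha = (2.0 / d) * math.sqrt(1.0 / min(alpha))          # bound (5)
    uniform = 2.0 / math.sqrt(d)                                     # bound (6); relies on d * min(alpha) >= 1
    return exact, via_min_alpha, uniform


if __name__ == "__main__":
    print(prop9_error_terms(100, [0.7, 3 / 90], [0.1, 0.9], [0.16, 1.0]))
```

For $d = 100$ all three terms are still of the same order as the thresholds themselves; they only become negligible as $d$ grows.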


3 Optimizing $\hat{m}_{\mathcal{S},T,w}$

Proposition 9 states that $m_{\mathcal{S},T,w}$ is bounded both below and above by the number $\hat{m}_{\mathcal{S},T,w} = \inf_{\tau>0}J_{\sigma,\alpha,\rho,\omega}(\tau)$, up to an error which is independent of the weights $w$. In particular, if we denote $\hat{m}_{\mathcal{S},T} = \inf_{\omega>0}\hat{m}_{\mathcal{S},T,w}$, we have

$$\hat{m}_{\mathcal{S},T} - \frac{2}{\sqrt{d}} \leq m_{\mathcal{S},T} \leq \hat{m}_{\mathcal{S},T}.$$

Hence, $\hat{m}_{\mathcal{S},T}$ is equal to $m_{\mathcal{S},T}$ up to an asymptotically vanishing error term. In this section, we prove that $\hat{m}_{\mathcal{S},T,w}$ has a unique minimum (up to scaling of $\omega$) at some $\omega^*$ for every $\mathcal{S}$. As the proof will show, it can furthermore easily be found by minimizing a convex function of $k$ variables. The proof uses several techniques exploited for partly other purposes in [1].

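Theorem 10. Assume that $|T| < d$ and let $x_0 \in \mathbb{R}^d$ be a vector supported on $T$. For each partition $\mathcal{S} = (S_i)_{i=1}^k$ of $[1,\dots,d]$, there exist weights $\omega^* \in \mathbb{R}_+^k$ so that

$$\hat{m}_{\mathcal{S},T,w^*} = \inf_{\tau>0}\mathbb{E}\big(d^{-1}\operatorname{dist}^2(g, \tau\partial\|x_0\|_{1,w^*})\big)$$

is minimal, where $w^* = \sum_{i=1}^k \omega_i^* 1_{S_i}$. The weights $\omega^*$ are for instance uniquely determined by the additional constraint $\|\omega^*\|_\infty = 1$.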

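Proof. First, we note that the chain rule proves that

$$\partial\|x_0\|_{1,w} = \partial\big(\|D^w\,\cdot\,\|_1\big)(x_0) = D^w\,\partial\|D^w x_0\|_1 = D^w\,\partial\|x_0\|_1 =: D^w C,$$

where we used $(D^w)^* = D^w$ and the fact that the subdifferential $\partial\|x\|_1$ only depends on the sign pattern of $x$. Hence, since $x_0$ and $D^w x_0$ have the same sign pattern, $\partial\|D^w x_0\|_1 = \partial\|x_0\|_1$. If for $v \in \mathbb{R}_+^k$ we denote $\Lambda^v = \operatorname{diag}\big(\sum_{i=1}^k v_i 1_{S_i}\big)$, then $\tau D^w = \Lambda^{\tau\omega}$ and hence

$$\inf_{\omega>0}\inf_{\tau>0}\mathbb{E}\big(\operatorname{dist}^2(g, \tau\partial\|x_0\|_{1,w})\big) = \inf_{v>0}\mathbb{E}\big(\operatorname{dist}^2(g, \Lambda^v C)\big).$$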
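Therefore, if we prove that there exists a unique $v^* \neq 0$ so that $\mathbb{E}(\operatorname{dist}^2(g, \Lambda^v C))$ is minimal, then $\omega^* = v^*/\|v^*\|_\infty$ is the optimal weight we are looking for, since then

$$\inf_{\tau>0}\mathbb{E}\big(\operatorname{dist}^2(g, \tau\partial\|x_0\|_{1,w^*})\big) = \inf_{\tau>0}\mathbb{E}\big(\operatorname{dist}^2(g, \Lambda^{\tau\omega^*}C)\big) = \mathbb{E}\big(\operatorname{dist}^2(g, \Lambda^{v^*}C)\big) \leq \mathbb{E}\big(\operatorname{dist}^2(g, \Lambda^{\tau\omega}C)\big)$$

for any $\tau > 0$, $\omega > 0$, from which the optimality of $\omega^*$ follows. Here, $w^*$ of course denotes $\sum_{i=1}^k \omega_i^* 1_{S_i}$.

In order to prove that there exists a unique minimizer of $\phi(v) = \mathbb{E}(\operatorname{dist}^2(g, \Lambda^v C))$ on $\mathbb{R}_+^k$, it suffices to prove that $\phi$ is strictly convex, continuous and coercive (“large outside a ball”).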

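Strict convexity: Let $s_i \in \Lambda^{v_i}C$, $i = 1,2$, and $\theta \in [0,1]$. Then (due to the structure of $C = \partial\|x_0\|_1$) we know that $s_i = \Lambda^{v_i}(\operatorname{sgn} x_0 + u_i)$ for some $u_i$ with $\operatorname{supp} u_i \cap \operatorname{supp} x_0 = \emptyset$ and $\|u_i\|_\infty \leq 1$. This implies that

$$\theta s_1 + (1-\theta)s_2 = \big(\theta\Lambda^{v_1} + (1-\theta)\Lambda^{v_2}\big)\operatorname{sgn} x_0 + \big(\theta\Lambda^{v_1}u_1 + (1-\theta)\Lambda^{v_2}u_2\big).$$

The right hand side of the previous equation is an element of $\Lambda^{\theta v_1 + (1-\theta)v_2}C$. This is because $\theta\Lambda^{v_1} + (1-\theta)\Lambda^{v_2} = \Lambda^{\theta v_1 + (1-\theta)v_2}$, because the action of diagonal matrices does not change the support of a vector, and because of the chain of inequalities

$$\begin{aligned}
|\langle\theta\Lambda^{v_1}u_1 + (1-\theta)\Lambda^{v_2}u_2, e_i\rangle| &\leq \theta|\langle\Lambda^{v_1}u_1, e_i\rangle| + (1-\theta)|\langle\Lambda^{v_2}u_2, e_i\rangle| \\
&= \theta|u_1(i)|v_1(i) + (1-\theta)|u_2(i)|v_2(i) \leq \theta v_1(i) + (1-\theta)v_2(i),
\end{aligned}$$

where $v_l(i)$ denotes the $i$-th diagonal entry of $\Lambda^{v_l}$. From the latter we deduce that $\theta\Lambda^{v_1}u_1 + (1-\theta)\Lambda^{v_2}u_2$ can be written as $\Lambda^{\theta v_1 + (1-\theta)v_2}u$, where $\operatorname{supp} u \cap \operatorname{supp} x_0 = \emptyset$ and $\|u\|_\infty \leq 1$. We have thus proven that for every triple $v_1, v_2 \in \mathbb{R}_+^k$, $\theta \in [0,1]$,

$$\theta\Lambda^{v_1}C + (1-\theta)\Lambda^{v_2}C \subseteq \Lambda^{\theta v_1 + (1-\theta)v_2}C.$$

Therefore, for each fixed $u \in \mathbb{R}^d$ and $s_i \in \Lambda^{v_i}C$, $i = 1,2$, we have

$$\begin{aligned}
\operatorname{dist}(u, \Lambda^{\theta v_1 + (1-\theta)v_2}C) &\leq \|u - (\theta s_1 + (1-\theta)s_2)\|_2 \\
&\leq \theta\|u - s_1\|_2 + (1-\theta)\|u - s_2\|_2.
\end{aligned}$$

Taking the infimum over $s_1$ and $s_2$, it follows that for each fixed $u$, the function $v \mapsto \operatorname{dist}(u, \Lambda^v C)$ is convex. The square of a non-negative convex function is still convex, and since choosing $u = g$ a Gaussian vector and taking the expectation also does not destroy the convexity, we can conclude that $\phi$ is convex.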

It remains to prove strict convexity. Since convexity already has been established, it suffices to prove that there do not exist $v_1 \neq v_2$ and $\theta \in (0,1)$ with $\phi(\theta v_1 + (1-\theta)v_2) = \theta\phi(v_1) + (1-\theta)\phi(v_2)$. Towards a contradiction, assume that such $v_1$, $v_2$ and $\theta$ exist. Then

$$\mathbb{E}\big(\operatorname{dist}^2(g, \Lambda^{\theta v_1 + (1-\theta)v_2}C)\big) = \mathbb{E}\big(\theta\operatorname{dist}^2(g, \Lambda^{v_1}C) + (1-\theta)\operatorname{dist}^2(g, \Lambda^{v_2}C)\big).$$

According to what was just proven, the expression over which we are taking the expectation on the right hand side is not smaller than the expression on the left hand side. Hence, in order for equality to be true, the two expressions have to be equal almost surely. But there exists some $u \in \Lambda^{\theta v_1 + (1-\theta)v_2}C$ which does not lie in $\Lambda^{v_1}C$ (since $v_1 \neq \theta v_1 + (1-\theta)v_2$). For this vector, strict inequality must hold, since $\operatorname{dist}^2(u, \Lambda^{\theta v_1 + (1-\theta)v_2}C) = 0 < \operatorname{dist}^2(u, \Lambda^{v_1}C)$. Now, since the distance from a vector $u$ to a convex set depends continuously on $u$, the strict inequality even holds in some $\varepsilon$-ball around $u$, which has positive Gaussian measure. The two expressions are hence not almost surely equal. This is a contradiction, and hence $\phi$ is strictly convex.
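Continuity: We have, for $u \in \mathbb{R}^d$ fixed and each $p \in C$,

$$\operatorname{dist}(u, \Lambda^v C) \leq \|u - \Lambda^v p\|_2 \leq \|u - \Lambda^{\tilde v}p\|_2 + \|(\Lambda^v - \Lambda^{\tilde v})p\|_2.$$

Taking the infimum over $p \in C$ and interchanging the roles of $v$ and $\tilde v$, we obtain

$$|\operatorname{dist}(u, \Lambda^v C) - \operatorname{dist}(u, \Lambda^{\tilde v}C)| \leq \sup_{p \in C}\|(\Lambda^v - \Lambda^{\tilde v})p\|_2 = \sup_{p \in C}\sqrt{\sum_{i=1}^d (v(i) - \tilde v(i))^2 p(i)^2} \leq \|v - \tilde v\|_2,$$

where $v(i)$ denotes the $i$-th diagonal entry of $\Lambda^v$, and we used $|p(i)| \leq 1$ for $p \in C$.

Denote $\operatorname{dist}(g, \Lambda^v C) = r_v$ and $\operatorname{dist}(g, \Lambda^{\tilde v}C) = r_{\tilde v}$, respectively. The inequality just proved then reads $|r_v - r_{\tilde v}| \leq \|v - \tilde v\|_2$, and we may estimate

$$\begin{aligned}
|\phi(v) - \phi(\tilde v)| &= |\mathbb{E}((r_v - r_{\tilde v})(r_v + r_{\tilde v}))| \leq \|v - \tilde v\|_2\,\mathbb{E}\big((r_v + r_{\tilde v})^2\big)^{1/2} \\
&\leq \|v - \tilde v\|_2\sqrt{2\mathbb{E}(r_v^2 + r_{\tilde v}^2)} = \|v - \tilde v\|_2\sqrt{2(\phi(v) + \phi(\tilde v))}.
\end{aligned}$$

$\phi$ is furthermore locally bounded, since $\phi(v) = \mathbb{E}(\operatorname{dist}^2(g, \Lambda^v C)) \leq 2\mathbb{E}\big(\|g\|_2^2 + \sup_{s \in C}\|\Lambda^v s\|_2^2\big) \leq 2d + 2\|v\|_2^2$. Hence, $\phi$ is continuous.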

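Coercivity: We have to prove that $\|v\|_2 \to \infty$ implies $\phi(v) \to \infty$. Let us first note that due to our global assumption $\alpha_i > 0$ for each $i$, there exists for every $i = 1,\dots,k$ an index $j_i \in S_i$ with $j_i \in \operatorname{supp} x_0$. For each such index and each $p \in C$, we have $p(j_i) = \operatorname{sgn} x_0(j_i) = \pm 1$ due to $p|_{\operatorname{supp} x_0} = \operatorname{sgn} x_0|_{\operatorname{supp} x_0}$, and hence

$$\inf_{p \in C}\|\Lambda^v p\|_2^2 \geq \inf_{p \in C}\sum_{i=1}^k v_i^2 p(j_i)^2 = \sum_{i=1}^k v_i^2 = \|v\|_2^2.$$

That implies that for any given $K > 0$, there exists an $R > 0$ so that $\|v\|_2 > R \Rightarrow \inf_{p \in C}\|\Lambda^v p\|_2 > K$. Hence, if $\|v\|_2 > R$ and $\|u\|_2 < r < K$, we have

$$\operatorname{dist}^2(u, \Lambda^v C) \geq \operatorname{pos}\Big(\inf_{p \in C}\|\Lambda^v p\|_2 - \|u\|_2\Big)^2 \geq (K - r)^2.$$

This implies that if $\|v\|_2 > R$:

$$\mathbb{E}\big(\operatorname{dist}^2(g, \Lambda^v C)\big) \geq \mathbb{E}\big(1_{\{\|g\|_2 < r\}}\operatorname{dist}^2(g, \Lambda^v C)\big) \geq \mathbb{P}(\|g\|_2 < r)(K - r)^2.$$

Since $\mathbb{P}(\|g\|_2 < r) > 0$, this proves the claim.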

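Now it remains to prove that the minimum is not attained in $v = 0$. For this, consider choosing all weights equal to $\lambda > 0$, i.e. $v = \lambda(1,\dots,1)$. Lemma 7 applied with $\omega_i = \lambda$ for all $i$ (and $\tau = 1$) yields

$$d^{-1}\phi(\lambda(1,\dots,1)) = \sigma + \lambda^2\sigma + (1 - \sigma)\varphi(\lambda).$$

We may estimate $\varphi(\lambda)$ by

$$\varphi(\lambda) = \sqrt{\frac{2}{\pi}}\int_\lambda^\infty (x - \lambda)^2\exp(-x^2/2)\,dx \leq \sqrt{\frac{2}{\pi}}\int_0^\infty (x^2 - 2x\lambda + \lambda^2)\exp(-x^2/2)\,dx = 1 + \lambda^2 - \lambda\sqrt{\frac{8}{\pi}}.$$

Therefore

$$d^{-1}\phi(\lambda(1,\dots,1)) \leq \sigma + \lambda^2\sigma + (1 - \sigma)\Big(1 + \lambda^2 - \lambda\sqrt{\frac{8}{\pi}}\Big) < 1 = d^{-1}\phi(0)$$

for small values of $\lambda$, since $\sigma < 1$ by assumption. □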
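The final estimate in this proof can also be checked numerically. The sketch below (our own helper names) evaluates $\varphi$ in closed form, verifies $\varphi(\lambda) \leq 1 + \lambda^2 - \lambda\sqrt{8/\pi}$, and confirms that $d^{-1}\phi(\lambda(1,\dots,1))$ drops below $d^{-1}\phi(0) = 1$ for small $\lambda$. Since the upper bound simplifies to $1 + \lambda^2 - (1-\sigma)\lambda\sqrt{8/\pi}$, it is below 1 exactly when $0 < \lambda < (1-\sigma)\sqrt{8/\pi}$.

```python
import math


def varphi(tau):
    """Equation (1) in closed form."""
    pdf = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return 2.0 * ((1.0 + tau * tau) * tail - tau * pdf)


def diagonal_value(lam, sigma):
    """d^{-1} phi(lambda (1, ..., 1)) = sigma + lambda^2 sigma + (1 - sigma) varphi(lambda)."""
    return sigma + lam * lam * sigma + (1.0 - sigma) * varphi(lam)


def diagonal_upper_bound(lam, sigma):
    """sigma + lambda^2 sigma + (1 - sigma)(1 + lambda^2 - lambda sqrt(8 / pi))."""
    return sigma + lam * lam * sigma + (1.0 - sigma) * (1.0 + lam * lam - lam * math.sqrt(8.0 / math.pi))


if __name__ == "__main__":
    sigma = 0.1
    print("bound below 1 for 0 < lambda <", (1.0 - sigma) * math.sqrt(8.0 / math.pi))
    for lam in (0.05, 0.1, 0.2, 0.5):
        print(lam, diagonal_value(lam, sigma), diagonal_upper_bound(lam, sigma))
```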
Having established an abstract result on the existence of optimal weights $\omega^*$, we now take a step back to see that the proof of said theorem actually provides a way to concretely calculate them: we only need to minimize the function $\phi(v) = \mathbb{E}(\operatorname{dist}^2(g, \Lambda^v C))$. Writing $v = \tau\omega$ as in the proof, we obtain, with the help of Lemma 7,

$$\phi(v) = d\Big(\sigma + \sum_{i=1}^k \rho_i\big(\alpha_i v_i^2 + (1 - \alpha_i)\varphi(v_i)\big)\Big).$$

Since $\phi$ has the structure of a sum of convex functions, each depending on only one of the variables $v_i$, it can further be minimized by considering one variable at a time. One could argue that this would have been a much simpler way of proving the assertion of Theorem 10, but the approach used in that proof has more potential to be applied to more general problems, and hence has an interest of its own.

Since these convex functions are even differentiable, it is not hard to write down a formula for calculating $\omega_i^* = v_i^*$ (up to normalization) analytically. As a matter of fact, this has already been carried out in [1, Prop. 4.5]. There, the formula was used in the process of calculating the statistical dimension of $\mathcal{D}(\|\cdot\|_1,x_0)$. In particular, it was not recognized as an optimal choice of a weight $\omega_i$. For completeness, we include the statement.

Corollary 11. The optimal weights $\omega^*$ described in Theorem 10 are, up to normalization, given through the equations

$$\alpha_i\omega_i^* = (1 - \alpha_i)\sqrt{\frac{2}{\pi}}\int_{\omega_i^*}^\infty (x - \omega_i^*)\exp(-x^2/2)\,dx, \quad i = 1,\dots,k.$$

In particular, $\omega^*$ is independent of $\rho$.

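Finally, we can extract one more interesting corollary, which in fact is a counterpart of Theorem 2 in this setting. Using the fact that $\sigma = \sum_{i=1}^k \alpha_i\rho_i$, one can see that

$$\phi(v) = d\sum_{i=1}^k \rho_i\big(\alpha_i(1 + v_i^2) + (1 - \alpha_i)\varphi(v_i)\big).$$

Noticing that the term in the sum is actually the function $J_{\alpha_i}$ we need to minimize in order to find $\hat{\mu}_{\alpha_i|S_i|,|S_i|} := \inf_{\tau>0}J_{\alpha_i}(\tau)$ (the upper bound on $\mu_{\alpha_i|S_i|,|S_i|}$ from Theorem 5), we immediately arrive at the following result, which also summarizes the entirety of this paper.

Theorem 12. Let $T \subseteq [1,\dots,d]$ with $|T| = s < d$, and a partition $\mathcal{S} = (S_i)_{i=1}^k$ of $[1,\dots,d]$ be given. Then there exist weights $\omega^*$ which minimize $\hat{m}_{\mathcal{S},T,w}$ and which are unique up to multiplication with a positive scalar. Furthermore, we have

$$\hat{m}_{\mathcal{S},T} = \sum_{i=1}^k \rho_i\hat{\mu}_{\alpha_i|S_i|,|S_i|},$$

and consequently the lower bound $\hat{m}_{\mathcal{S},T} - 2/\sqrt{d}$ on $m_{\mathcal{S},T}$ equals $\sum_{i=1}^k \rho_i\hat{\mu}_{\alpha_i|S_i|,|S_i|} - 2/\sqrt{d}$.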
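Theorem 12 turns $\hat{m}_{\mathcal{S},T}$ into a weighted sum of one-dimensional quantities, so comparing partitions becomes a small computation. The sketch below (hypothetical names) evaluates $\sum_i \rho_i\hat{\mu}_{\alpha_i|S_i|,|S_i|}$ for the four-set configuration of the second numerical experiment, for the coarser partition obtained by merging $S_1$, $S_2$, $S_3$, and for no prior information at all. As a short consequence of Theorem 12 (not stated as such in the paper): $J_s(\tau)$ is affine in $s$, so $s \mapsto \inf_\tau J_s(\tau)$ is concave, and by Jensen's inequality refining a partition can never increase $\hat{m}_{\mathcal{S},T}$. The assertions check this ordering numerically.

```python
import math


def varphi(tau):
    """Equation (1) in closed form."""
    pdf = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return 2.0 * ((1.0 + tau * tau) * tail - tau * pdf)


def mu_hat(s, lo=0.0, hi=20.0, iters=200):
    """inf_tau J_s(tau) with J_s(tau) = s (1 + tau^2) + (1 - s) varphi(tau), via ternary search."""
    f = lambda t: s * (1.0 + t * t) + (1.0 - s) * varphi(t)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


def m_hat_partition(alpha, rho):
    """Theorem 12: hat m_{S,T} = sum_i rho_i * hat mu_{alpha_i |S_i|, |S_i|}."""
    return sum(r * mu_hat(a) for a, r in zip(alpha, rho))


if __name__ == "__main__":
    fine = m_hat_partition([4 / 5, 3 / 10, 2 / 15, 1 / 70], [0.05, 0.1, 0.15, 0.7])
    merged = m_hat_partition([9 / 30, 1 / 70], [0.3, 0.7])
    no_prior = m_hat_partition([0.1], [1.0])
    print(fine, merged, no_prior)
```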


4 Numerical experiments 

In this section, we present the results of some numerical experiments investigating the practical performance of the weighting strategy described above. Before presenting the setup of the experiment, let us note that it is numerically relatively easy to calculate the optimal weights described in Corollary 11: we simply have to solve the $k$ independent, one-dimensional equations

$$\alpha_i\omega_i^* = (1 - \alpha_i)\sqrt{\frac{2}{\pi}}\int_{\omega_i^*}^\infty (x - \omega_i^*)\exp(-x^2/2)\,dx, \quad i = 1,\dots,k$$

for $\omega_i^*$. For this, we used the MATLAB routine fzero. To be able to compare these weights to other strategies, we furthermore normalized them so that $\|\omega\|_\infty = 1$. In a similar manner, one can explicitly write down the optimality condition $J'_{\sigma,\alpha,\rho,\omega}(\tau) = 0$ to calculate the thresholds induced by given weights $\omega$.

Figure 1: Plot of the optimal weight $\omega_1$ ($\omega_2$ is always equal to 1) for $\rho = (0.1, 0.9)$ and different $\alpha_1$, compared with $1 - \alpha_1$.
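The paper solves these equations with MATLAB's fzero. As an alternative sketch (not the original code), they can be solved by plain bisection in Python. It uses that $\sqrt{2/\pi}\int_v^\infty (x-v)e^{-x^2/2}\,dx = 2\big(p_N(v) - vQ(v)\big)$, with $p_N$ and $Q$ the standard normal density and upper tail, and that $h(v) = \alpha v - 2(1-\alpha)\big(p_N(v) - vQ(v)\big)$ is strictly increasing with $h(0) < 0$ for $0 < \alpha < 1$. After normalizing to $\|\omega\|_\infty = 1$, the results can be compared with the weights reported in the captions of Figures 3 and 4. Function names are hypothetical.

```python
import math


def tail_first_moment(v):
    """sqrt(2/pi) * int_v^inf (x - v) exp(-x^2/2) dx = 2 * (pdf(v) - v * Q(v))."""
    pdf = math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(v / math.sqrt(2.0))
    return 2.0 * (pdf - v * tail)


def solve_corollary11(a, hi=20.0, tol=1e-12):
    """Root of h(v) = a v - (1 - a) * tail_first_moment(v) for 0 < a < 1, by bisection."""
    if not 0.0 < a < 1.0:
        raise ValueError("alpha_i must lie in (0, 1)")
    h = lambda v: a * v - (1.0 - a) * tail_first_moment(v)
    lo = 0.0  # h(0) < 0 and h is strictly increasing
    if h(hi) <= 0.0:
        raise ValueError("upper end of the bracket is too small for this alpha_i")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_weights(alpha):
    """Solve Corollary 11 for each alpha_i and normalize so that ||omega||_inf = 1."""
    v = [solve_corollary11(a) for a in alpha]
    vmax = max(v)
    return [vi / vmax for vi in v]


if __name__ == "__main__":
    for a1, a2 in ((0.3, 7 / 90), (0.5, 5 / 90), (0.7, 3 / 90)):
        print(a1, optimal_weights([a1, a2]))
    print(optimal_weights([4 / 5, 3 / 10, 2 / 15, 1 / 70]))
```

Bisection is chosen here only because the monotonicity of $h$ makes it robust and dependency-free; any bracketing root finder would do.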


In the first set of experiments, we consider the case of two sets $S_1$, $S_2$. We fix the ambient dimension to $d = 100$ and choose random 10-sparse signals: the support of $x_0$ is drawn at random, and the values of $x_0$ on the support are drawn from the standard multivariate normal distribution. Then we draw 3, 5 or 7 indices in the support, respectively, together with 7, 5 or 3 indices outside the support, respectively, to form a group $S_1$. The rest of the indices is then called $S_2$. Hence, in each experiment, $\rho_1 = \sigma = 0.1$ and $\alpha_1 = 0.3$, $0.5$ and $0.7$, respectively.
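The random instances of this first experiment can be generated as follows. This is a minimal Python sketch of the sampling procedure described above (hypothetical function name, standard library only); solving $(P_1^w)$ itself requires a convex solver such as the cvx package used in the paper and is not reproduced here.

```python
import random


def make_instance(d=100, s=10, k_in=7, group_size=10, seed=None):
    """Draw an s-sparse x0 with N(0, 1) entries on a random support, and a group S1 with
    k_in indices from the support and group_size - k_in indices outside it; S2 is the rest."""
    rng = random.Random(seed)
    support = rng.sample(range(d), s)
    support_set = set(support)
    outside = [j for j in range(d) if j not in support_set]
    x0 = [0.0] * d
    for j in support:
        x0[j] = rng.gauss(0.0, 1.0)
    S1 = set(rng.sample(support, k_in)) | set(rng.sample(outside, group_size - k_in))
    S2 = set(range(d)) - S1
    return x0, support_set, S1, S2


if __name__ == "__main__":
    for k_in in (3, 5, 7):
        x0, T, S1, S2 = make_instance(k_in=k_in, seed=k_in)
        print(k_in, len(S1), len(T & S1) / len(S1))  # rho_1 = 0.1, alpha_1 = k_in / 10
```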

We consider 4 strategies:

1. $\omega = (1,1)$, which corresponds to standard $\ell_1$-minimization,

2. the extreme strategy $\omega = (0,1)$,

3. $\omega = (1 - \alpha_1, 1)$, i.e. the one from [7],

4. the one with weights calculated as proposed in this work.


The resulting weights for strategy (4) are depicted in Figure 1 for different values of $\alpha_1$. The theoretical thresholds $\inf_{\tau>0}J_{\sigma,\alpha,\rho,\omega}(\tau)$ were also calculated and are depicted in Figure 2. As can be seen, the statistical dimensions for our choice are in all cases lower than for the other approaches. The strategy from [7] is, however, very close to the one described in this paper.


To test the performance in practice, for each $m \in [1,\dots,35]$ we draw a matrix $A \in \mathbb{R}^{m \times d}$ according to the Gaussian distribution and solve the reweighted $\ell_1$-minimization problem $(P_1^w)$ with the help of the MATLAB package cvx [5]. For each $m$, 1000 experiments are performed, and a success is declared if the solution of the minimization problem differs by no more than 0.001 in $\ell_2$-norm from $x_0$. The results can be found in Figure 3. The figures show that reweighted $\ell_1$-minimization performs significantly better using the weight chosen as in Theorem 10 compared to standard Basis Pursuit ($\omega = 1$) for all $\alpha_1$ tested. The same is true for the comparison to the strategy $\omega = 0$ in the cases $\alpha_1 = .3$ and $\alpha_1 = .5$, but not in the case $\alpha_1 = 0.7$. This was expected, since experiments performed by the authors of [3] indicated that the choice of $\omega$ gets more important as $\alpha$ gets lower. The difference between our strategy and the one proposed in [7] is also present, but not significant.

Figure 2: The threshold number of Gaussian measurements needed to secure recovery with weighted $\ell_1$-minimization for different weighting strategies and $\rho = (0.1, 0.9)$, depending on $\alpha_1$.

In order to see that the theory for $k > 2$ is also practically relevant, we make another experiment and consider the case $\alpha = (4/5, 3/10, 2/15, 1/70)$ and $\rho = (.05, .1, .15, .7)$. $A$ is chosen as above for different values of $m$. We consider five different strategies:

1. Set $\omega = 1$ (i.e. perform standard $\ell_1$-minimization).

2. Consider the union $S_1 \cup S_2 \cup S_3$ as one region with $\rho = .3$ and $\alpha = 9/30$, and choose the two weights as $\omega = (1 - \alpha, 1)$.

3. Consider the union $S_1 \cup S_2 \cup S_3$ as one region with $\rho = .3$ and $\alpha = 9/30$, and choose the two weights as proposed in this work.

4. Choose four weights, one for each $S_i$, with $\omega_4 = 1$ and $\omega_i = 1 - \alpha_i$ for $i = 1,\dots,3$.

5. Choose four weights as proposed in this work.

Note that although strategy (4) bears resemblance to the one proposed in [7], there are no theoretical results which motivate why it should be used. In particular, it was not proposed by the authors of that paper, since they only considered the case $k = 2$. It should only be seen as a heuristic choice for comparison purposes. The results are depicted in Figure 4. We see that the optimal strategy proposed in this work involving four sets $S_i$ performs significantly better than all of the other strategies.
Figure 3: Numerical experiments concerning the effect of optimally choosing $\omega$. In each figure, $d = 100$, $\sigma = \rho_1 = 0.1$. $\alpha_1$ is different in each figure: 0.3 in the upper left figure, 0.5 in the upper right figure and 0.7 in the lower figure, respectively. The calculated optimal weights $\omega_1$ ($\omega_2$ is always equal to 1) in the respective experiments are equal to 0.5539 ($\alpha_1 = 0.3$), 0.3208 ($\alpha_1 = 0.5$) and 0.1599 ($\alpha_1 = 0.7$). The error bars correspond to 3 standard deviations.



Figure 4: Numerical experiment concerning the effect of using a finer partition $S_1, S_2, S_3, S_4$ instead of forming two sets $S_1 \cup S_2 \cup S_3$ and $S_4$. The sparsity parameter is $\sigma = 0.1$, $\rho = (0.05, 0.1, 0.15, 0.7)$ and the ambient dimension is $d = 100$. The optimal weights were in the first case calculated to be $(0.0884, 0.3742, 0.5617, 1)$ and in the second case $(0.3742, 1)$. The error bars correspond to 3 standard deviations.